node of a randomly selected edge, is known. En route to the conditional distribution of edge
distributions the expectation values of different quantities have been calculated. Our results provide
an exact solution not only for infinite, but for finite networks as well.

I. INTRODUCTION



In recent years, the statistical properties of complex networks have been extensively
investigated by the physics community. With the increasing computing power
of modern computers, the analysis of large-scale networks and databases has become possible.
It has been shown that the degree statistics of many natural and artificial networks follow
a power law. Examples of such networks range from social interconnections and scientific
collaborations to the world-wide web [6] and the Internet [7, 8]. These networks are
usually referred to as scale-free networks, since the power law distribution indicates that
there is no characteristic scale in these systems.

In the early 1960s Erdős and Rényi (ER) introduced random graphs that served as the
first mathematical model of complex networks [9]. In their model, the number of nodes is
fixed and connections are established randomly, with probability $p_{ER}$. Although the ER
model leads to a rich theory, it fails to predict the power law distributions observed in
scale-free networks. Barabási and Albert (BA) proposed a more suitable evolving model of these
networks [11]. The BA model is also based on random graph theory, but involves two
additional key principles: (a) growth, that is, the size of the network increases during its
development, and (b) preferential attachment, that is, new network elements are connected
to higher degree nodes with higher probability. In the BA model every new node connects
to the core network with a fixed number of links $m$.

The study of complex networks usually deals with the structural properties of networks,
like degree distribution, shortest path distribution, degree-degree correlations,
clustering [14], etc. For complex networks which involve a transport mechanism, betweenness is
the quantity of primary importance. Roughly speaking, betweenness is the number of shortest
paths passing through a certain network element. For example, in communication networks
information flows between remote hosts via intermediate stations, and in the Internet data
packets are transmitted through routers and cables. The expected traffic flowing through
a link or a router is proportional to the particular edge or node betweenness, respectively.
News and rumors spread in social networks, and node betweenness measures the importance
or centrality of an individual in society.



Node betweenness has been studied recently by Goh, Kahng, and Kim [15], who argued
that it follows a power law in scale-free networks, and that the exponent $\delta \approx 2.2$ is
independent of the degree distribution in a certain range. Szabó, Alava, and Kertész [13] used
rooted deterministic trees to model scale-free trees, and found the scaling exponent $\delta_t = 2$.
The same scaling exponent has been found experimentally by Goh, Kahng, and Kim for
scale-free trees. A rigorous proof of the heuristic results of [13] has been provided by
Bollobás and Riordan.



Until recently, less attention has been paid to edge betweenness, even though edge
betweenness is often essential for estimating the load on links in complex networks. For
example, the edge betweenness can measure the "importance" of relationships in social networks,
or it can measure the expected amount of data flow on links in computer networks. The
probability distribution of edge betweenness gives a rough statistical description of links and
it characterizes the network as a whole. Therefore, it is an important tool for an overall
description of links in complex networks.

In some cases, a local property of the network is known as well. For instance, if the number
of friends of any individual can be counted, then it is reasonable to ask about the "importance"
of a relationship (i.e. an edge in a social network) under the condition that the number of
friends of the related individuals is known. In this case, the conditional probability
distribution of edge betweenness provides a much finer description of links than the total
distribution.

In this paper we focus on how additional local information could be used to describe 
links. In particular, we aim at deriving the probability distribution of edge betweenness in 
evolving scale-free trees, under the condition that the in-degree of the "younger" node of 
any randomly selected link is known. For the sake of simplicity we consider the in-degree 
of the "younger" node only. Whether a node is "younger" than another node or not can 
be defined uniquely in evolving networks, since nodes attach to the network sequentially. 
Note that the in-degree is considered instead of total degree for practical reasons only. The 
construction of the network implies that the in-degree is less than the total degree by one 
for every "younger" node. 

To obtain the desired conditional distribution we first calculate the exact joint distribution
of cluster size and in-degree for a specific link. Then, the joint distribution for a randomly
selected link is derived, which is comparable with the edge ensemble statistics obtained from
a network realization. The exact marginal distributions of cluster size and in-degree follow
next. After that, we give the distribution and mean of cluster size under the condition that
the in-degree is known. For the sake of completeness the conditional in-degree distribution is
presented as well. Finally, the distribution and mean of edge betweenness are derived under
the condition that the corresponding in-degree is known. Note that all of our analytic results
are exact even for finite networks, which is valuable since the size of real networks is
often much smaller than the valid range of asymptotic formulae. Moreover, exact results for
unbounded networks are provided as well.

As a model of evolving scale-free trees we consider the BA model with parameter $m = 1$,
extended with initial attractiveness [13, 18]. With the initial attractiveness the scaling
properties of the network can be finely tuned. Note that in the limit of infinite initial
attractiveness the preferential attachment disappears, and new nodes are connected to the old
ones with uniform probability. In this limit the network loses its scale-free nature and
becomes similar to an ER network with $p_{ER} = 2/N$. Therefore, scale-free and non-scale-free
networks can be compared within one model. For the sake of simplicity the infinite limit of
initial attractiveness is referred to as the "ER limit" throughout this paper.

We restrict our model to trees, that is to connected loopless graphs. The simplicity of 
trees allows analytic results for edge betweenness, since the shortest paths in trees are unique 
between any pair of nodes. Although trees are special graphs, a number of real networks 
can be modelled by trees or by tree-like graphs with only a negligible number of shortcuts. 
Important examples of such networks are the Autonomous Systems in the Internet [3]. 

The rest of this paper is organized as follows: In Section II a short introduction to the
construction of BA trees is given. Then, a master equation for the joint distribution of
cluster size and in-degree of a specific edge is derived and solved in Section III and Section IV,
respectively. The total joint distribution of cluster size and in-degree is calculated in
Section V. The marginal and conditional distributions of cluster size and in-degree are derived
in Section VI and Section VII, respectively. In Section VIII the conditional distribution of
edge betweenness follows. Finally, we conclude our work and discuss future directions in
Section IX.



II. THE NETWORK MODEL 



The concepts of graph theory are used throughout this paper. A graph consists of vertices
(nodes) and edges (links). Edges are ordered or unordered pairs of vertices, depending on
whether a directed or an undirected graph is considered, respectively. The order of a graph
is the number of vertices it holds, while the degree of a vertex counts the number of edges
incident to it. A path is also defined in the most natural way: it is a vertex sequence, where
any two consecutive elements form an edge. A path is called a simple path if none of the
vertices in the path is repeated. Any two vertices in a tree can be connected by a unique
simple path. A graph is called connected if for any vertex pair there exists a path which
starts from one vertex and ends at the other.

The construction of the network proceeds in discrete time steps. Let us denote time with
$\tau \in \mathbb{N}$, and the developed graph with $G_\tau = (V_\tau, E_\tau)$, where $V_\tau$ and $E_\tau$ denote the set of
vertices and the set of edges at time step $\tau$, respectively. Initially, at $\tau = 0$, the graph
consists only of a single vertex without any edges. Then, in every time step, a new vertex
is connected to the network with a single edge. The edge is directed, which emphasizes that
the two sides of the edge are not symmetric. The newly connected node, which is the source
of the edge, is always "younger" than the target node. The term "younger node of a link"
is used in this sense below. Note that the initial vertex is different from all the others, since
it has only incoming connections; we refer to it as the root vertex.

The target of every new edge is selected randomly from the present vertices of the graph.
The probability that a new vertex connects to an old one is proportional to the attractiveness
of the old vertex $v$, defined as

$$A(v) = a + q, \tag{1}$$

where parameter $a \ge 0$ denotes the initial attractiveness and $q$ is the in-degree of
vertex $v$. It has been shown in [18] that the in-degree distribution is asymptotically
$P(q) \simeq (1 + a)\,\frac{\Gamma(2a + 1)}{\Gamma(a)}\,(q + a)^{-(2+a)}$. We will improve this result and derive the exact
in-degree distribution below. Note that in the special case $a = 0$ the attractiveness of every
node is zero except for the root vertex. It follows that every new vertex is connected to the
initial vertex in this case, which corresponds to a star topology. The special case $a = 1$
practically returns the original BA model. Indeed, except for the root vertex, the
attractiveness of every vertex becomes equal to its degree if $a = 1$; this is exactly the
definition of the attractiveness in the BA model [10]. Finally, if $a \to \infty$, then preferential
attachment disappears in the limit, and the model tends to a Poisson-type graph, similar to an
ER graph.
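The growth rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the simulation code used for the figures; the function name `grow_tree`, the list-based representation and the quadratic-time sampling are assumptions made here for brevity.

```python
import random

def grow_tree(tau, a, seed=None):
    """Grow a tree for tau time steps and return (parent, in_degree).

    Vertex 0 is the root. Vertex t (t >= 1) joins at time step t and attaches
    to an older vertex v with probability proportional to A(v) = a + q_v.
    """
    rng = random.Random(seed)
    parent = [None]        # parent[v] is the target ("older" end) of v's edge
    in_degree = [0]
    for t in range(1, tau + 1):
        if t == 1:
            target = 0     # only the root exists; this also covers a = 0
        else:
            weights = [a + q for q in in_degree]
            target = rng.choices(range(t), weights=weights)[0]
        parent.append(target)
        in_degree[target] += 1
        in_degree.append(0)
    return parent, in_degree
```

For example, `grow_tree(1000, 1.0)` corresponds to the case $a = 1$ discussed above, while `grow_tree(1000, 0.0)` produces a star, since only the root has non-zero attractiveness.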

The attractiveness of a sub-graph $S$ is the sum of the attractiveness of its elements:

$$A(S) = \sum_{v' \in S} A(v'). \tag{2}$$



FIG. 1: (Color online) Schematic illustration of the evolving network at time $\tau$. Vertex $v$, connected
to the network at $\tau_e$, denotes the root of cluster $C$. Variables $q$ and $n = |C| - 1$ denote the in-degree
of vertex $v$ and the number of nodes in $C$ without $v$ (marked by circles), respectively.

We refer to a connected sub-graph as a cluster. The attractiveness of cluster $C$ can be given
easily:

$$A(C) = (1 + a)\,|C| - 1, \tag{3}$$

where $|C|$ denotes the size of the cluster. It is obvious that the overall attractiveness of the
network at time step $\tau$ is

$$A(V_\tau) = (1 + a)\,(\tau + 1) - 1. \tag{4}$$

III. MASTER EQUATION FOR THE JOINT DISTRIBUTION OF CLUSTER
SIZE AND IN-DEGREE

Let us consider a network of size $N$ and an arbitrary edge $e$, which connected vertex
$v$ to the graph at time step $\tau_e > 0$, and let us denote by $C$ the cluster which has developed
on vertex $v$ until $\tau > \tau_e$ (Fig. 1). The calculation of the betweenness of the given edge is
straightforward in trees, since the number of shortest paths going through the given edge,
that is the betweenness of the edge, is obviously $L = |C|\,(N - |C|)$. Therefore, it is sufficient
to know the size of the cluster on the particular edge to get the edge betweenness.

The development of cluster $C$ can be regarded as a Markov process. The states of the
cluster are indexed by $(n, q)$, where $n = |C| - 1$ denotes the number of vertices in cluster
$C$ without $v$. The in-degree of vertex $v$ is denoted by $q$. Transition probabilities can be
obtained from the definition of preferential attachment:

$$W^{(1)}_{\tau,n,q} = \frac{A(C_\tau \setminus v)}{A(V_\tau)} = \frac{n - \alpha q}{\tau + 1 - \alpha}, \tag{5}$$

$$W^{(2)}_{\tau,n,q} = \frac{A(v)}{A(V_\tau)} = \frac{\alpha q + 1 - \alpha}{\tau + 1 - \alpha}, \tag{6}$$

where $\alpha = 1/(1 + a) \in\, ]0, 1]$, and $W^{(1)}_{\tau,n,q}$ and $W^{(2)}_{\tau,n,q}$ denote the probabilities of the
transitions $(n, q) \to (n + 1, q)$ and $(n, q) \to (n + 1, q + 1)$, respectively.

The Master-equation, which describes the Markov process, follows from the fact that
cluster $C$ can develop to state $(n, q)$ in three ways: a new vertex can be connected

1. to cluster $C$ but not to vertex $v$, and the cluster was in state $(n - 1, q)$,

2. to vertex $v$, and the cluster was in state $(n - 1, q - 1)$, or

3. to the rest of the network, and the cluster was in state $(n, q)$.

Therefore, the conditional probability $P_\tau(n, q \mid \tau_e)$ that the developed cluster on edge $e$ is in
state $(n, q)$ satisfies the following Master-equation:

$$\begin{aligned}
P_\tau(n, q \mid \tau_e) ={}& W^{(1)}_{\tau-1,n-1,q}\, P_{\tau-1}(n - 1, q \mid \tau_e) \\
&+ W^{(2)}_{\tau-1,n-1,q-1}\, P_{\tau-1}(n - 1, q - 1 \mid \tau_e) \\
&+ \left[1 - W^{(1)}_{\tau-1,n,q} - W^{(2)}_{\tau-1,n,q}\right] P_{\tau-1}(n, q \mid \tau_e).
\end{aligned} \tag{7}$$

Since the process starts with $n = 0$, $q = 0$ at $\tau = \tau_e$, the initial condition of the above
Master-equation is $P_{\tau_e}(n, q \mid \tau_e) = \delta_{n,0}\,\delta_{q,0}$, where $\delta_{i,j}$ is the Kronecker-delta symbol.
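The Markov process behind the Master-equation can also be iterated numerically. The sketch below is an assumption-laden illustration rather than part of the derivation: it pushes probability forward from each state $(n, q)$ with the weights $(n - \alpha q)/(\tau + 1 - \alpha)$ and $(\alpha q + 1 - \alpha)/(\tau + 1 - \alpha)$, which is equivalent to evaluating (7) state by state. Exact rational arithmetic is used, so $\alpha$ should be passed as a `Fraction`; the function name `master_equation` is hypothetical.

```python
from fractions import Fraction

def master_equation(tau, tau_e, alpha):
    """Return {(n, q): P_tau(n, q | tau_e)} by iterating the Markov chain."""
    alpha = Fraction(alpha)
    P = {(0, 0): Fraction(1)}              # initial condition at tau = tau_e
    for t in range(tau_e, tau):            # one step takes time t to t + 1
        denom = t + 1 - alpha
        new = {}
        for (n, q), p in P.items():
            w1 = (n - alpha * q) / denom            # (n, q) -> (n + 1, q)
            w2 = (alpha * q + 1 - alpha) / denom    # (n, q) -> (n + 1, q + 1)
            moves = (((n + 1, q), w1),
                     ((n + 1, q + 1), w2),
                     ((n, q), 1 - w1 - w2))         # attached elsewhere
            for state, w in moves:
                if w:
                    new[state] = new.get(state, 0) + p * w
        P = new
    return P
```

For instance, with $\alpha = 1/2$ and $\tau_e = 1$, one step gives probability $1/3$ for state $(1, 1)$: at that moment the root has attractiveness 2 and the new vertex has attractiveness 1.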

IV. THE SOLUTION OF THE MASTER EQUATION 

After substituting the above transition probabilities into (7), the following first order
linear partial difference equation is obtained:

$$\begin{aligned}
(\tau - \alpha)\, P_\tau(n, q \mid \tau_e) ={}& (n - 1 - \alpha q)\, P_{\tau-1}(n - 1, q \mid \tau_e) \\
&+ (\alpha q + 1 - 2\alpha)\, P_{\tau-1}(n - 1, q - 1 \mid \tau_e) \\
&+ (\tau - n - 1)\, P_{\tau-1}(n, q \mid \tau_e).
\end{aligned} \tag{8}$$

Let us seek a particular solution of (8) in product form: $f(\tau)\,g(n)\,h(q)$. The following
equation is obtained after substituting the probe function into (8):

$$(\tau - \alpha)\,\frac{f(\tau)}{f(\tau - 1)} - \tau = (n - 1 - \alpha q)\,\frac{g(n - 1)}{g(n)} - n - 1 + (\alpha q + 1 - 2\alpha)\,\frac{g(n - 1)}{g(n)}\,\frac{h(q - 1)}{h(q)}.$$

The above partial difference equation can be separated into a system of three ordinary
difference equations. The solutions of the separated equations are:

$$f(\tau) = \frac{\Gamma(\tau + \lambda_1)}{\Gamma(\tau - \alpha + 1)}, \tag{9}$$

$$g(n) = \frac{\Gamma(n + \lambda_2)}{\Gamma(n + \lambda_1 + 1)}, \tag{10}$$

$$h(q) = \frac{\Gamma(q + 1/\alpha - 1)}{\Gamma(q + \lambda_2/\alpha + 1)}, \tag{11}$$

where $\lambda_1$ and $\lambda_2$ are separation parameters.

The solution of (7), which fulfils the initial conditions, is constructed from the linear
combination of the above particular solutions:

$$P_\tau(n, q \mid \tau_e) = \sum_{\lambda_1, \lambda_2} C_{\lambda_1,\lambda_2}\, f(\tau)\, g(n)\, h(q), \tag{12}$$

where the coefficients $C_{\lambda_1,\lambda_2}$ are independent of $\tau$, $n$ and $q$.

To obtain the coefficients $C_{\lambda_1,\lambda_2}$, the initial condition of (7) is expanded on the bases of $g(n)$
and $h(q)$. The detailed calculation is presented in Appendix A.

The solution of (7) is

$$P_\tau(n, q \mid \tau_e) = \frac{\Gamma(\tau_e + 1 - \alpha)\,\Gamma(\tau - \tau_e + 1)\,\Gamma(\tau - n)\,\Gamma(q + 1/\alpha - 1)}{\Gamma(\tau_e)\,\Gamma(n + 1)\,\Gamma(\tau - \tau_e - n + 1)\,\Gamma(\tau + 1 - \alpha)\,\Gamma(1/\alpha - 1)}\;\Phi_\alpha(n, q), \tag{13}$$

where $\Phi_\alpha(n, q) = \sum_{k=0}^{q} \frac{(-1)^k}{k!\,(q - k)!}\,(-\alpha k)_n$ and $(x)_n = \Gamma(n + x)/\Gamma(x)$ denotes Pochhammer's
symbol. Note that $P_\tau(n, q \mid \tau_e) \ne 0$ iff $0 \le q \le n \le \tau - \tau_e$. The conditions $0 \le q$
and $n \le \tau - \tau_e$ are obvious, since $1/\Gamma(k) = 0$ by definition if $k$ is a negative integer or
zero. Furthermore, the condition $q \le n$ can be easily seen if $\Phi_\alpha(n, q)$ is transformed into
the following equivalent form: $\Phi_\alpha(n, q) = \frac{1}{q!}\,\frac{d^n}{dz^n}\left[z^{n-1}\,(1 - z^{-\alpha})^q\right]_{z=1}$. This result coincides
with the fact that the size of a cluster $n$ cannot be less than the corresponding
in-degree $q$.
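As a sanity check, the closed-form solution can be evaluated exactly for rational $\alpha$. The sketch below assumes the form $P_\tau(n, q \mid \tau_e) = \binom{T}{n}\,\frac{(\tau_e)_{T-n}}{(\tau_e + 1 - \alpha)_T}\,(1/\alpha - 1)_q\,\Phi_\alpha(n, q)$ with $T = \tau - \tau_e$, which is (13) with the Gamma function ratios rewritten as Pochhammer symbols and a binomial coefficient. The helper names `poch`, `phi` and `p_cond` are introduced here only for illustration, and $\alpha$ is expected to be a `Fraction` with $0 < \alpha < 1$.

```python
from fractions import Fraction
from math import comb, factorial

def poch(x, n):
    """Pochhammer symbol (x)_n = x (x + 1) ... (x + n - 1), integer n >= 0."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out

def phi(alpha, n, q):
    """Phi_alpha(n, q) = sum_k (-1)^k (-alpha k)_n / (k! (q - k)!)."""
    return sum(Fraction((-1) ** k, factorial(k) * factorial(q - k))
               * poch(-alpha * k, n) for k in range(q + 1))

def p_cond(tau, tau_e, alpha, n, q):
    """Closed form of P_tau(n, q | tau_e) for a Fraction alpha in (0, 1)."""
    T = tau - tau_e
    if not 0 <= q <= n <= T:
        return Fraction(0)
    timing = comb(T, n) * poch(Fraction(tau_e), T - n) / poch(tau_e + 1 - alpha, T)
    return timing * poch(1 / alpha - 1, q) * phi(alpha, n, q)
```

Summing `p_cond` over all admissible states $(n, q)$ gives exactly 1, which is a convenient check of the normalization.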



V. JOINT DISTRIBUTION OF CLUSTER SIZE AND IN-DEGREE 

Equation (13) provides the conditional probability that a particular edge which was
connected to the network at $\tau_e$ is in state $(n, q)$ at $\tau > \tau_e$. In a fully developed network, however,
the time when a particular edge is connected to the network is usually not known. Moreover,
the development of an individual link is usually not as important as the properties of the
finally developed link ensemble. Therefore, we are more interested in the total probability
$P_\tau(n, q)$, that is the probability that a randomly selected edge is in state $(n, q)$ at $\tau$, than
in the conditional probability (13). The total probability can be calculated with the help of
the total probability theorem:

$$P_\tau(n, q) = \sum_{\tau_e=1}^{\tau} P_\tau(n, q \mid \tau_e)\, P_\tau(\tau_e), \tag{14}$$

where $P_\tau(\tau_e)$ is the probability that a randomly selected edge was included into the network
at $\tau_e$. According to the construction of the network one edge is added to the network at
every time step, therefore $P_\tau(\tau_e) = 1/\tau$. The following formula can be obtained after the
above summation has been carried out:

$$P_\tau(n, q) = \frac{\tau + 1 - \alpha}{\tau}\,\frac{(1/\alpha - 1)_q}{(2 - \alpha)_{n+1}}\,\Phi_\alpha(n, q), \tag{15}$$

where $\Phi_\alpha(n, q)$ is the function defined below Eq. (13) and $0 < \alpha < 1$. In star topology, that is
when $\alpha = 1$, the joint distribution $P_\tau(n, q)$ evidently degenerates to $P_\tau(n, q) = \delta_{n,0}\,\delta_{q,0}$.

The ER limit of the joint distribution can be obtained via the $\alpha \to 0$ limit of (15) (see
Appendix B for details):

$$\lim_{\alpha \to 0} P_\tau(n, q) = \frac{\tau + 1}{\tau}\,\frac{(-1)^{n-q}\, S_n^{(q)}}{\Gamma(n + 3)}, \tag{16}$$

where $0 \le q \le n < \tau$ and $S_n^{(q)}$ denote the Stirling numbers of the first kind. Note that for
the special case $n = q = 0$ the ER limit is $\lim_{\alpha \to 0} P_\tau(0, 0) = (\tau + 1)/(2\tau)$.
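The joint distribution is easy to evaluate exactly for rational $\alpha$. The following sketch assumes the form $P_\tau(n, q) = \frac{\tau + 1 - \alpha}{\tau}\,\frac{(1/\alpha - 1)_q}{(2 - \alpha)_{n+1}}\,\Phi_\alpha(n, q)$ for $0 \le q \le n < \tau$; the helper names are hypothetical and $\alpha$ must be a `Fraction` with $0 < \alpha < 1$. It can be used to confirm that the distribution is normalized for a finite network.

```python
from fractions import Fraction
from math import factorial

def poch(x, n):
    """Pochhammer symbol (x)_n = x (x + 1) ... (x + n - 1), integer n >= 0."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out

def phi(alpha, n, q):
    """Phi_alpha(n, q) = sum_k (-1)^k (-alpha k)_n / (k! (q - k)!)."""
    return sum(Fraction((-1) ** k, factorial(k) * factorial(q - k))
               * poch(-alpha * k, n) for k in range(q + 1))

def joint(tau, alpha, n, q):
    """Joint distribution of cluster size n and in-degree q at time tau."""
    if not 0 <= q <= n < tau:
        return Fraction(0)
    prefactor = (tau + 1 - alpha) / tau
    return prefactor * poch(1 / alpha - 1, q) * phi(alpha, n, q) / poch(2 - alpha, n + 1)

def total_probability(tau, alpha):
    """Sum of the joint distribution over all admissible states."""
    return sum(joint(tau, alpha, n, q) for n in range(tau) for q in range(n + 1))
```

For example, `joint(2, Fraction(1, 2), 1, 1)` evaluates to `Fraction(1, 6)`, which agrees with a direct enumeration: of the two edges of a three-vertex tree, only the first one can carry a cluster with $n = q = 1$, and it does so with probability $1/3$.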

The above formulae have been verified by extensive numerical simulations. The joint
empirical cluster size and in-degree distribution has been compared with the analytic formula
(15) for $\alpha = 1/2$ in Fig. 2. Subfigures 2(a) and 2(b) represent intersections of the joint
distribution with cutting planes of fixed in-degrees and cluster sizes, respectively. The
figures confirm that the empirical distributions, obtained as relative frequencies of links
with cluster size $n$ and in-degree $q$ in 100 network realizations, are in complete agreement
with the derived analytic results.

FIG. 2: (Color online) Joint empirical distribution of cluster size and in-degree at $\alpha = 1/2$
(symbols), and analytic formula (15) (solid lines) are compared on double-logarithmic plot.
(a) Joint distribution of cluster size and in-degree as a function of cluster size $n$.
(b) Joint distribution of cluster size and in-degree as a function of in-degree $q$. Simulation
results have been obtained from 100 realizations of $10^5$ size networks.

Equation (15) is the fundamental result of this section. The derived distribution is exact
for any finite value of $\tau$, that is for any finite BA tree. This result is valuable for modeling
a number of real networks, where the size of the network is small compared to the relevant
range of cluster size or in-degree. If the size of the network is much larger than the relevant
range of cluster size or in-degree, then it is practical to consider the network as infinitely
large, that is to take the $\tau \to \infty$ limit. For the above joint distributions (15) and (16) the
$\tau \to \infty$ limit is evident, since the $\tau$-dependent prefactors obviously tend to 1 as the size of
the network grows beyond every limit.

VI. MARGINAL DISTRIBUTIONS OF CLUSTER SIZE AND IN-DEGREE



We have derived the joint probability distribution of the cluster size and the in-degree in
the previous section. In many cases it is sufficient to know the probability distribution of
only one random variable, since the information on the other variable is either unavailable
or not needed. The one dimensional distribution may also be needed in its own right,
for example for the calculation of a conditional distribution in Section VII.

The one dimensional (marginal) distributions $P_\tau(n)$ and $P_\tau(q)$ can be obtained from the joint
distribution $P_\tau(n, q)$ as follows:

$$P_\tau(n) = \sum_{q=0}^{n} P_\tau(n, q), \qquad P_\tau(q) = \sum_{n=q}^{\tau-1} P_\tau(n, q).$$

After substituting (15) into the above formulae the following expressions are obtained:

$$P_\tau(n) = \frac{\tau + 1 - \alpha}{\tau}\,\frac{1 - \alpha}{(n + 1 - \alpha)\,(n + 2 - \alpha)}, \tag{17}$$

if $0 \le n < \tau$ and $P_\tau(n) = 0$ if $n \ge \tau$. Furthermore,

$$P_\tau(q) = \frac{\tau + 1 - \alpha}{\tau}\,\frac{1}{\alpha}\,\frac{(1/\alpha - 1)_{1/\alpha}}{(q + 1/\alpha - 1)_{1/\alpha+1}} - \frac{\tau + 1 - \alpha}{\tau}\,\frac{(1/\alpha - 1)_q}{(2 - \alpha)_\tau} \sum_{k=0}^{q} \frac{(-1)^k}{k!\,(q - k)!}\,\frac{(-\alpha k)_\tau}{\alpha k + 2 - \alpha} \tag{18}$$

if $0 \le q < \tau$ and $P_\tau(q) = 0$ otherwise. Rice's method has been applied to evaluate the
first term of $P_\tau(q)$ in closed form.

The ER limit of the marginal cluster size distribution can obviously be obtained from
(17) at $\alpha = 0$. Furthermore, the ER limit of the marginal in-degree distribution can be
derived analogously to the limit of the joint distribution, shown in Appendix B:

$$\lim_{\alpha \to 0} P_\tau(q) = \frac{\tau + 1}{\tau}\,\frac{1}{2^{q+1}} - \frac{\tau + 1}{\tau}\,\frac{1}{\Gamma(\tau + 2)\,\Gamma(q)}\,\left.\frac{d^{q-1}}{d\alpha^{q-1}}\,\frac{(1 + \alpha)_{\tau-1}}{2 - \alpha}\right|_{\alpha=0}. \tag{19}$$

If the size of the network grows beyond every limit, that is if $\tau \to \infty$, then the marginal
distributions become much simpler:

$$P_\infty(n) = \frac{1 - \alpha}{(n + 1 - \alpha)\,(n + 2 - \alpha)}, \tag{20}$$

$$P_\infty(q) = \frac{1}{\alpha}\,\frac{(1/\alpha - 1)_{1/\alpha}}{(q + 1/\alpha - 1)_{1/\alpha+1}}, \tag{21}$$

$$\lim_{\alpha \to 0} P_\infty(q) = 2^{-q-1}. \tag{22}$$

FIG. 3: (Color online) Figure shows comparison of empirical CCDFs of cluster size distributions
(points) with analytic formula (24) (lines) on logarithmic plots, at $\alpha = 0$ and $1/2$. Empirical
distributions have been obtained from 10 realizations of $N = 10^6$ size networks.

The asymptotic behavior of the cluster size and in-degree distributions differs significantly.
The tail of the cluster size distribution follows a power law with exponent 2 in both BA and ER
networks, independently of $\alpha$. However, the tail of the in-degree distribution
follows a power law with exponent $1/\alpha + 1 = 2 + a$ in BA networks, and it falls exponentially
in the ER topology, which agrees with the well-known results of previous works.

It is worth noting that the mean cluster size diverges logarithmically as the size of the
network tends to infinity: $E_\tau\{n\} = \sum_{n=0}^{\tau-1} n\,P_\tau(n) = (1 - \alpha)\ln\tau + O(1)$. The expectation
value of the in-degree, however, obviously remains finite: $E_\tau\{q\} < 1$, and $E_\infty\{q\} = 1$
if the size of the network is infinite. Moreover, the variance of the in-degree can
also be given exactly when the size of the network grows beyond every limit:

$$E_\infty\{(q - 1)^2\} = \frac{2}{1 - 2\alpha}. \tag{23}$$

This result implies that the fluctuations of the in-degree diverge in a boundless network if
$\alpha = 1/2$, that is in the classical BA model.

Our analytic results have been verified with computer simulations. Since cumulative
distributions are more suitable to be compared with simulations than ordinary distributions,
we matched the corresponding complementary cumulative distribution functions (CCDF)
against simulation data. The CCDF of cluster size, $F^c_\tau(n) = \sum_{n'=n}^{\tau-1} P_\tau(n')$, can be calculated
straightforwardly:

$$F^c_\tau(n) = \frac{\tau + 1 - \alpha}{\tau}\,\frac{1 - \alpha}{n + 1 - \alpha} - \frac{1 - \alpha}{\tau}, \tag{24}$$

where $0 \le n < \tau$ and $0 < \alpha < 1$. The CCDF of in-degree, $F^c_\tau(q) = \sum_{q'=q}^{\tau-1} P_\tau(q')$, is more
complex, however:

$$F^c_\tau(q) = \frac{\tau + 1 - \alpha}{\tau}\,\frac{(1/\alpha - 1)_{1/\alpha}}{(q + 1/\alpha - 1)_{1/\alpha}} - \frac{1 - \alpha}{\tau} + \frac{\tau + 1 - \alpha}{\tau}\,\frac{(1/\alpha - 1)_q}{(2 - \alpha)_\tau} \sum_{k=0}^{q-2} \frac{(-1)^k\,(1 - \alpha - \alpha k)_{\tau-1}}{k!\,(q - 2 - k)!\,(k + 1/\alpha)\,(k + 2/\alpha)}, \tag{25}$$

where $0 \le q < \tau$ and $0 < \alpha < 1$. If the size of the network grows beyond every limit, then
the CCDFs are the following:

$$F^c_\infty(n) = \frac{1 - \alpha}{n + 1 - \alpha}, \qquad F^c_\infty(q) = \frac{(1/\alpha - 1)_{1/\alpha}}{(q + 1/\alpha - 1)_{1/\alpha}}, \tag{26}$$

where $0 \le n$, $0 \le q$ and $0 < \alpha < 1$.

FIG. 4: (Color online) Figure shows comparison of empirical CCDFs of in-degree distributions
(points) with analytic formula (25) (lines) on logarithmic plots, at $\alpha = 0$, $1/3$, $1/2$, and $2/3$.
Empirical distributions have been obtained from 10 realizations of $N = 10^6$ size networks. Inset:
Comparison at $\alpha = 0$ on semi-logarithmic plot.
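The closed form of the cluster size CCDF can be cross-checked against a direct summation of the marginal distribution. The sketch below assumes $P_\tau(n) = \frac{\tau+1-\alpha}{\tau}\,\frac{1-\alpha}{(n+1-\alpha)(n+2-\alpha)}$ for $0 \le n < \tau$ and $F^c_\tau(n) = \frac{\tau+1-\alpha}{\tau}\,\frac{1-\alpha}{n+1-\alpha} - \frac{1-\alpha}{\tau}$; the function names are illustrative only, and exact fractions are used so that the comparison is not affected by rounding.

```python
from fractions import Fraction

def p_n(tau, alpha, n):
    """Marginal cluster size distribution for 0 <= n < tau."""
    if not 0 <= n < tau:
        return Fraction(0)
    return (tau + 1 - alpha) / tau * (1 - alpha) / ((n + 1 - alpha) * (n + 2 - alpha))

def ccdf_n(tau, alpha, n):
    """Closed-form CCDF of the cluster size."""
    return (tau + 1 - alpha) / tau * (1 - alpha) / (n + 1 - alpha) - (1 - alpha) / tau

def ccdf_n_by_summation(tau, alpha, n):
    """CCDF obtained by summing the marginal distribution from n to tau - 1."""
    return sum(p_n(tau, alpha, m) for m in range(n, tau))
```

Both functions agree for every $0 \le n < \tau$, and the CCDF equals 1 at $n = 0$, as it should for a normalized distribution.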

The analytic CCDF of cluster size (24) and the empirical distributions are compared
in Figure 3 for $\alpha = 0$, $1/3$, $1/2$, and $2/3$. Experimental data has been collected from 10
realizations of $10^6$ node networks. Figure 3 shows that simulations fully confirm our analytic
result.

In Figure 4 the analytic formula (25) and the empirical CCDFs of in-degree, obtained from
the same $10^6$ node realizations, are compared. Note the precise match of the simulation and
the theoretical distribution on almost the whole data range. Some small discrepancy can be
observed around the low probability events. This deviation is caused by the aggregation of
errors on the cumulative distribution when some rare event occurs in a finite network.

VII. CONDITIONAL PROBABILITIES AND EXPECTATION VALUES 

In the previous sections exact joint and marginal distributions of cluster size and in-degree
have been analyzed for both finite and infinite networks. All these distributions provide
general statistics of the network. In this section we proceed further, and we investigate the
scenario when the in-degree of the "younger" node of a randomly selected link is known. We ask
for the cluster size distribution under this condition, that is the conditional distribution
$P_\tau(n \mid q)$. The results of the previous sections are used below to obtain the conditional
probability distribution, and eventually the conditional expectation of cluster size. For the
sake of completeness, the conditional distribution and expectation of in-degree are given as
well at the end of this section.

The conditional cluster size distribution can be given by the quotient of the joint and the
marginal in-degree distributions by definition:

$$P_\tau(n \mid q) = \frac{P_\tau(n, q)}{P_\tau(q)}. \tag{27}$$

The exact conditional distribution for any finite network can be obtained after substituting
(15) and (18) into the above expression. For a boundless network the conditional distribution
takes the simpler form:

$$P_\infty(n \mid q) = \alpha\,\frac{(2/\alpha - 1)_{q+1}}{(2 - \alpha)_{n+1}}\,\Phi_\alpha(n, q), \tag{28}$$

where $0 \le q \le n$. If $n \gg 1$ and $q \ge 1$, then $P_\infty(n \mid q)$ decays as $n^{-3}$ to leading order, that is the
conditional cluster size distribution falls faster than the ordinary cluster size distribution,
which decays as $n^{-2}$. It follows that the mean of the conditional cluster size distribution will
not diverge like the mean of the ordinary distribution.

What is the expected size of a cluster under the condition that the in-degree of its root
is known? For practical reasons, we do not calculate $E_\tau\{n \mid q\}$ directly, but we calculate
$E_\tau\{n + 2 - \alpha \mid q\} = E_\tau\{n \mid q\} + 2 - \alpha$ instead:

$$E_\tau\{n + 2 - \alpha \mid q\} = \frac{1}{P_\tau(q)} \sum_{n=q}^{\tau-1} (n + 2 - \alpha)\, P_\tau(n, q). \tag{29}$$

Since $(n + 2 - \alpha)\, P_\tau(n, q) = \frac{\tau + 1 - \alpha}{\tau}\,\frac{(1/\alpha - 1)_q}{(2 - \alpha)_n}\,\Phi_\alpha(n, q)$, the above summation can be evaluated
similarly to the marginal distribution $P_\tau(q)$ in (18):

$$\sum_{n=q}^{\tau-1} (n + 2 - \alpha)\, P_\tau(n, q) = \frac{\tau + 1 - \alpha}{\tau}\,\frac{1/\alpha - 1}{q + 1/\alpha - 1} - \frac{\tau + 1 - \alpha}{\tau}\,\frac{(1/\alpha - 1)_q}{(2 - \alpha)_{\tau-1}} \sum_{k=0}^{q} \frac{(-1)^k}{k!\,(q - k)!}\,\frac{(-\alpha k)_\tau}{\alpha k + 1 - \alpha}.$$
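After replacing the above sum in $E_\tau\{n + 2 - \alpha \mid q\}$, the following equation can be obtained:

$$E_\tau\{n + 2 - \alpha \mid q\} = (1 - \alpha)\,\frac{(q + 1/\alpha)_{1/\alpha}}{(1/\alpha - 1)_{1/\alpha}}\, G_\tau(q), \tag{30}$$

where

$$G_\tau(q) = \frac{1 - \dfrac{(1/\alpha - 1)_{q+1}}{(1 - \alpha)_\tau} \displaystyle\sum_{k=0}^{q} \frac{(-1)^k}{k!\,(q - k)!}\,\frac{(-\alpha k)_\tau}{k + 1/\alpha - 1}}{1 - \dfrac{(2/\alpha - 1)_{q+1}}{(2 - \alpha)_\tau} \displaystyle\sum_{k=0}^{q} \frac{(-1)^k}{k!\,(q - k)!}\,\frac{(-\alpha k)_\tau}{k + 2/\alpha - 1}}. \tag{31}$$

The identity $\lim_{\tau \to \infty} G_\tau(q) = 1$ implies that $G_\tau(q)$ contains the finite scale effects, and the
factors preceding $G_\tau(q)$ give the asymptotic form of $E_\tau\{n + 2 - \alpha \mid q\}$:

$$E_\infty\{n + 2 - \alpha \mid q\} = (1 - \alpha)\,\frac{(q + 1/\alpha)_{1/\alpha}}{(1/\alpha - 1)_{1/\alpha}}. \tag{32}$$

It can be seen that the expectation of cluster size, under the condition that the in-degree
is known, is finite in an unbounded network. This stands in contrast to the unconditional
mean cluster size, discussed in the previous section, which diverges logarithmically as the
size of the network grows beyond every limit.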

In the ER limit, the expected conditional cluster size becomes

$$\lim_{\alpha \to 0} E_\infty\{n + 2 \mid q\} = 2^{q+1}. \tag{33}$$

The fundamental difference between the scale-free and non-scale-free networks can be
observed again. In the scale-free case the expected conditional cluster size asymptotically
grows with the in-degree to the power of $1/\alpha$, while in the latter case it grows exponentially.

In Figure 5 the exact analytic formula (30) is compared with simulation results at $\alpha = 0$,
$1/3$, $1/2$, and $2/3$. The simulations clearly confirm our analytic solution.
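The convergence of the finite-size conditional mean to its asymptotic value can be illustrated numerically. The sketch below is not the authors' code: it computes $E_\tau\{n \mid q\}$ directly from the joint distribution, assumed here in the form $P_\tau(n, q) = \frac{\tau+1-\alpha}{\tau}\,\frac{(1/\alpha-1)_q}{(2-\alpha)_{n+1}}\,\Phi_\alpha(n, q)$, and compares it with $(1-\alpha)\,(q+1/\alpha)_{1/\alpha}/(1/\alpha-1)_{1/\alpha} - (2-\alpha)$, where the Pochhammer symbols with non-integer index are evaluated through `math.gamma`. All function names are hypothetical.

```python
from fractions import Fraction
from math import factorial, gamma

def poch(x, n):
    """Pochhammer symbol (x)_n for integer n >= 0."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out

def phi(alpha, n, q):
    """Phi_alpha(n, q) = sum_k (-1)^k (-alpha k)_n / (k! (q - k)!)."""
    return sum(Fraction((-1) ** k, factorial(k) * factorial(q - k))
               * poch(-alpha * k, n) for k in range(q + 1))

def joint(tau, alpha, n, q):
    """Joint distribution of cluster size and in-degree (alpha is a Fraction)."""
    if not 0 <= q <= n < tau:
        return Fraction(0)
    return ((tau + 1 - alpha) / tau * poch(1 / alpha - 1, q)
            * phi(alpha, n, q) / poch(2 - alpha, n + 1))

def cond_mean_n(tau, alpha, q):
    """E_tau{n | q}, computed directly from the joint distribution."""
    probs = [joint(tau, alpha, n, q) for n in range(q, tau)]
    return sum(n * p for n, p in zip(range(q, tau), probs)) / sum(probs)

def cond_mean_n_infinite(alpha, q):
    """Asymptotic conditional mean of n for an unbounded network."""
    a = float(alpha)
    inv = 1.0 / a
    ratio = gamma(q + 2 * inv) / gamma(q + inv) * gamma(inv - 1) / gamma(2 * inv - 1)
    return (1 - a) * ratio - (2 - a)
```

For $\alpha = 1/2$ and $q = 1$ the asymptotic value is $3/2$, and the finite-size result for $\tau = 200$ is already within about one percent of it.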

Let us briefly investigate the opposite scenario, that is when the cluster size is known and
the statistics of the in-degree under this condition is sought. The conditional distribution
can be obtained from the combination of Eqs. (15), (17) and the definition

$$P_\tau(q \mid n) = \frac{P_\tau(n, q)}{P_\tau(n)}. \tag{34}$$

The conditional expectation of in-degree can be acquired by the same technique as the
conditional expectation of cluster size. Let us calculate $E_\tau\{q + 1/\alpha - 1 \mid n\} = E_\tau\{q \mid n\} + 1/\alpha - 1$
instead of $E_\tau\{q \mid n\}$ directly:

$$E_\tau\{q + 1/\alpha - 1 \mid n\} = \frac{1}{P_\tau(n)} \sum_{q=0}^{n} (q + 1/\alpha - 1)\, P_\tau(n, q) = \frac{\Gamma(2 - \alpha)}{\alpha}\,(n + 1 - \alpha)_\alpha, \tag{35}$$

where $0 \le n < \tau$. Note that the conditional expectation of in-degree is independent of $\tau$,
that is of the size of the network. In the ER limit the expectation of the in-degree becomes

$$\lim_{\alpha \to 0} E_\tau\{q \mid n\} = \Psi(n + 1) + \gamma, \tag{36}$$

where $\Psi(x) = \frac{d}{dx}\ln\Gamma(x)$ denotes the digamma function, and $\gamma = -\Psi(1) \approx 0.5772$ is the
Euler-Mascheroni constant. Asymptotically the expectation of the in-degree in a scale-free
tree grows with the cluster size to the power of $\alpha$, while in an ER tree it grows only
logarithmically, since $\Psi(n + 1) = \ln n + O(1/n)$. Therefore, the conditional expectations of
in-degree and cluster size are asymptotically inverse functions of each other. Figure 6 shows
the analytic solution (35) and simulation data at $\alpha = 0$, $1/3$, $1/2$, and $2/3$ parameter values.
Simulation data has been collected from 100 realizations of $10^5$ size networks.
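The conditional mean in-degree can be checked in exact arithmetic. Since $\Gamma(2-\alpha)\,(n+1-\alpha)_\alpha/\alpha = (1/\alpha - 1)\,n!/(1-\alpha)_n$, the closed form can be written without Gamma functions as $E_\tau\{q \mid n\} = (1/\alpha - 1)\left[n!/(1-\alpha)_n - 1\right]$. The sketch below compares this with a direct summation over the joint distribution, assumed in the form $P_\tau(n, q) = \frac{\tau+1-\alpha}{\tau}\,\frac{(1/\alpha-1)_q}{(2-\alpha)_{n+1}}\,\Phi_\alpha(n, q)$; the function names are illustrative and $\alpha$ must be a `Fraction`.

```python
from fractions import Fraction
from math import factorial

def poch(x, n):
    """Pochhammer symbol (x)_n for integer n >= 0."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out

def phi(alpha, n, q):
    """Phi_alpha(n, q) = sum_k (-1)^k (-alpha k)_n / (k! (q - k)!)."""
    return sum(Fraction((-1) ** k, factorial(k) * factorial(q - k))
               * poch(-alpha * k, n) for k in range(q + 1))

def joint(tau, alpha, n, q):
    """Joint distribution of cluster size and in-degree."""
    if not 0 <= q <= n < tau:
        return Fraction(0)
    return ((tau + 1 - alpha) / tau * poch(1 / alpha - 1, q)
            * phi(alpha, n, q) / poch(2 - alpha, n + 1))

def cond_mean_q(tau, alpha, n):
    """E_tau{q | n}, computed directly from the joint distribution."""
    probs = [joint(tau, alpha, n, q) for q in range(n + 1)]
    return sum(q * p for q, p in enumerate(probs)) / sum(probs)

def cond_mean_q_closed(alpha, n):
    """Closed form: (1/alpha - 1) (n! / (1 - alpha)_n - 1)."""
    return (1 / alpha - 1) * (factorial(n) / poch(1 - alpha, n) - 1)
```

The two functions agree exactly, and the result does not change when $\tau$ is varied, in line with the remark that the conditional expectation of the in-degree is independent of the network size.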



VIII. CONDITIONAL DISTRIBUTION OF EDGE BETWEENNESS 

Using the results of the previous sections, we are finally ready to answer the problem which
motivated our work, that is the distribution of the edge betweenness under the condition
that the in-degree of the "younger" node of the link is known. It has been noted at the
beginning of Section III that the edge betweenness can be expressed with cluster size:

$$L = (n + 1)\,(\tau - n). \tag{37}$$

Therefore, the conditional edge betweenness can be given formally by the following
transformation of random variable $n$:

$$P_\tau(L \mid q) = \sum_{n=0}^{\tau-1} \delta_{L,(n+1)(\tau-n)}\, P_\tau(n \mid q). \tag{38}$$

Obviously, $P_\tau(L \mid q)$ is non-zero only at those values of $L$ where (37) has an integer solution
for $n$. If

$$n_L = \frac{\tau - 1}{2} - \sqrt{\frac{(\tau + 1)^2}{4} - L} \tag{39}$$

is such an integer solution of the quadratic equation (37), and $L \ne (\tau + 1)^2/4$, then

$$P_\tau(L \mid q) = P_\tau(n_L \mid q) + P_\tau(\tau - 1 - n_L \mid q). \tag{40}$$

If $L = (\tau + 1)^2/4$ is an integer, then $P_\tau(L \mid q) = P_\tau(n_L \mid q)$.

FIG. 5: (Color online) Figure shows the average cluster size as the function of the in-degree $q$,
obtained from 100 realizations of $10^5$ size networks. Simulation data has been collected at $\alpha = 0$,
$1/3$, $1/2$, and $2/3$ parameter values. Analytical result (30) of conditional expectation $E_\tau\{n \mid q\}$ is
shown with continuous lines.

FIG. 6: (Color online) Figure shows the average in-degree as the function of the cluster size $n$,
obtained from 100 realizations of $10^5$ size networks. Simulation data has been collected at $\alpha = 0$,
$1/3$, $1/2$, and $2/3$ parameter values. Analytical result (35) of conditional expectation $E_\tau\{q \mid n\}$ is
shown with continuous lines.
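The relation $L = (n + 1)(\tau - n)$ and the transformation of the cluster size distribution into a betweenness distribution can be sketched as follows. The representation of the tree as a `parent` list (with `parent[0]` set to `None` for the root and `parent[v] < v` for every other vertex) and both function names are assumptions made for this illustration.

```python
from collections import Counter
from fractions import Fraction

def edge_betweenness(parent):
    """Return {v: L_v} for the edge that connected each non-root vertex v.

    Cluster sizes |C| = n + 1 are accumulated from the youngest vertex
    backwards, and L = |C| (N - |C|) = (n + 1)(tau - n) with N = tau + 1.
    """
    tau = len(parent) - 1
    size = [1] * (tau + 1)
    for v in range(tau, 0, -1):        # children always have larger labels
        size[parent[v]] += size[v]
    return {v: size[v] * (tau + 1 - size[v]) for v in range(1, tau + 1)}

def betweenness_pmf(tau, cluster_pmf):
    """Map a distribution over n to a distribution over L = (n + 1)(tau - n)."""
    out = Counter()
    for n, p in cluster_pmf.items():
        out[(n + 1) * (tau - n)] += p  # n and tau - 1 - n give the same L
    return dict(out)
```

For the five-vertex tree `[None, 0, 0, 1, 1]` the edge of vertex 1 separates three vertices from two, so its betweenness is 6, while the three remaining edges lead to leaves and have betweenness 4.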

The conditional expectation of edge betweenness can be obtained from (37): 

$$E_\tau\{L \mid q\} = \tau E_\tau\{n + 1 \mid q\} - E_\tau\{(n + 1)\,n \mid q\}. \tag{41}$$

Therefore, the exact calculation of $E_\tau\{L \mid q\}$ requires the first and the second moments of the 
conditional cluster size distribution. The first moment, that is the mean, has 
been derived in the previous section. To calculate the second moment, let us use the 
technique developed in the previous sections. Let us consider: 

E T {(n + 2-a)(n + l-a)\q} = — ^ (2 _ ^ ■ (42) 

We shall be cautious when the summation over $n$ is evaluated. The $k = 1$ term in 
$\Phi_\alpha(n, q) = \sum_{k=0}^{q} \frac{(-1)^k}{k!\,(q-k)!}\,(-\alpha k)_n$ must be treated separately to avoid a divergent term: 

T-l 



n=q 



1 1 (-I)" (-afc) T 

«(2-«) r _ 2 ^A;!(g-A;)! 

The exact formula for $E_\tau\{L \mid q\}$ can be obtained straightforwardly, after (30) and the above 
expressions have been substituted into (41). 
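The substitution relies on an elementary rearrangement: since $(n+2-\alpha)(n+1-\alpha) = (n+1)\,n + 2(1-\alpha)(n+1) - \alpha(1-\alpha)$, Eq. (41) can equivalently be written as $E_\tau\{L \mid q\} = (\tau + 2 - 2\alpha)\,E_\tau\{n+1 \mid q\} - E_\tau\{(n+2-\alpha)(n+1-\alpha) \mid q\} - \alpha(1-\alpha)$. The Python sketch below checks this rearrangement numerically. The function names and the toy distribution are illustrative assumptions; the closed-form moments of the paper are not reproduced here.

```python
def expected_L_direct(cluster_pmf, tau):
    """E{L} computed directly from L = (n + 1) * (tau - n), Eq. (37)."""
    return sum((n + 1) * (tau - n) * p for n, p in enumerate(cluster_pmf))

def expected_L_from_moments(cluster_pmf, tau, alpha):
    """E{L} from E{n + 1} and the shifted second moment E{(n+2-a)(n+1-a)}."""
    m1 = sum((n + 1) * p for n, p in enumerate(cluster_pmf))
    m2 = sum((n + 2 - alpha) * (n + 1 - alpha) * p
             for n, p in enumerate(cluster_pmf))
    return (tau + 2 - 2 * alpha) * m1 - m2 - alpha * (1 - alpha)

# Toy (hypothetical) distribution over n = 0..7 for tau = 8.
weights = [1.0 / (n + 1) ** 2 for n in range(8)]
toy_pmf = [w / sum(weights) for w in weights]
direct = expected_L_direct(toy_pmf, 8)
via_moments = expected_L_from_moments(toy_pmf, 8, alpha=0.5)
```

Both routes agree for any distribution over $n$, so the shifted moments are sufficient to evaluate the conditional expectation.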

Let us consider the scenario when the size of the network tends to infinity. Equation (37) 
implies that edge betweenness diverges as $\tau \to \infty$, therefore $L$ should be rescaled for an 
infinite network. From the asymptotics of the digamma function, $\psi(\tau - \alpha) = \ln \tau + O(1/\tau)$, 
it follows that $E_\tau\{(n + 2 - \alpha)(n + 1 - \alpha) \mid q\}$ grows only logarithmically, slower than the 
linear growth of $\tau E_\tau\{n + 2 - \alpha \mid q\}$. Therefore, edge betweenness asymptotically grows 
linearly as the size of the network grows beyond every limit. Let us rescale edge betweenness 

$$\Lambda_\tau = \frac{L}{\tau + 1} \tag{43}$$




and let us consider the limit $\Lambda = \lim_{\tau \to \infty} \Lambda_\tau = n + 1$. The CCDF of the rescaled 
edge betweenness can be given by $F^c_\infty(\Lambda \mid q) = \sum_{n=\Lambda-1}^{\infty} P_\infty(n \mid q)$. 
When the summation has been carried out, the following equation 
is obtained: 

where $q + 1 < \Lambda$. If $1 < q \ll \Lambda$, then only the first term of the sum should be taken into 
account, and it is easy to see that 

™\') = m$^)i!? +0 W^- (45) 

It can be seen that the scaling exponent $-2$ is independent of $\alpha$. The above asymptotic 
formula has been obtained for infinite networks. The same power law scaling as in (45) can be observed 
in finite size networks if $\Lambda_\tau \ll \tau$. However, $F^c_\tau(\Lambda_\tau \mid q) = 0$ if $\Lambda_\tau > \tau$ in finite networks, 
therefore the asymptotic formula (45) evidently becomes invalid if $\Lambda_\tau \approx \tau$. 

It is obvious that as the size of the network grows larger and larger, the asymptotic formula 
(44) becomes more and more accurate. One can ask how fast the convergence is. From 
elementary estimations of $F^c_\tau(\Lambda_\tau \mid q)$ one can show that for fixed $\Lambda_\tau$: 

n? (1- nA 1 

F C AK I q) = F^(A T I q) - (1 - F^(A T \ q)) { - > - + O (l/r 2+ «) , (46) 

that is, corrections to the asymptotic formula decrease as $\tau^{-2}$ for large $\tau$. 

Figure 7 compares the analytic formula (44) with simulation results for 
$q = 1$ and $q = 2$. The empirical CCDF of rescaled edge betweenness, under the condition 
that in-degree $q$ is known, is shown for networks of size $10^4$, $10^5$, and $10^6$, at the $\alpha = 1/2$ parameter 
value. The empirical CCDFs of rescaled edge betweenness evidently collapse to the same 
curve for different size networks, and they coincide precisely with our analytic result. 

The expectation of the rescaled edge betweenness under the condition that in-degree $q$ is 
known can be given by $E_\infty\{\Lambda \mid q\} = E_\infty\{n + 1 \mid q\}$. Using (32) and (33) we get 

$$E_\infty\{\Lambda \mid q\} = (1 - \alpha)\,\frac{(q + 1/\alpha)_{1/\alpha}}{(1/\alpha - 1)_{1/\alpha}} - 1 + \alpha, \tag{47}$$

$$\lim_{\alpha \to 0} E_\infty\{\Lambda \mid q\} = 2^{q+1} - 1. \tag{48}$$

One can see that $E_\infty\{\Lambda \mid q\} \sim q^{1/\alpha}$ for $q \gg 1$ if $\alpha > 0$, and $E_\infty\{\Lambda \mid q\} \sim e^{q}$ for $q \gg 1$ if 
$\alpha \to 0$. 
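As a numerical illustration, the following Python sketch evaluates the conditional expectation under the assumption that Eq. (47) reads $E_\infty\{\Lambda \mid q\} = (1-\alpha)\,(q + 1/\alpha)_{1/\alpha} / (1/\alpha - 1)_{1/\alpha} - 1 + \alpha$, with Pochhammer's symbol understood as $(x)_a = \Gamma(x + a)/\Gamma(x)$. The function names are hypothetical, and the sketch is restricted to $0 < \alpha < 1$ so that all Gamma function arguments are positive.

```python
import math

def log_pochhammer(x, a):
    """log of (x)_a = Gamma(x + a) / Gamma(x), valid for x > 0 and x + a > 0."""
    return math.lgamma(x + a) - math.lgamma(x)

def expected_rescaled_betweenness(q, alpha):
    """Assumed form of Eq. (47) for in-degree q >= 0 and 0 < alpha < 1."""
    inv = 1.0 / alpha
    log_ratio = log_pochhammer(q + inv, inv) - log_pochhammer(inv - 1.0, inv)
    return (1.0 - alpha) * math.exp(log_ratio) - 1.0 + alpha

def er_limit(q):
    """The alpha -> 0 limit stated in Eq. (48)."""
    return 2 ** (q + 1) - 1
```

Working with logarithms of Gamma functions avoids overflow when $\alpha$ is small. Under the assumed form, a link whose "younger" node has $q = 0$ gives an expectation of 1 for every $\alpha$, and small values of $\alpha$ approach the limit (48).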
FIG. 7: (Color online) CCDF of edge betweenness under the condition that the 
in-degree $q$ is known. The empirical CCDF has been obtained from 100 realizations of $N = 10^4$ and 
$N = 10^5$, and 10 realizations of $N = 10^6$ size networks at the $\alpha = 1/2$ parameter value. Continuous 
lines show the analytic result (44). 

Analytic results (47) and (48), and simulation data are shown in Figure 8 at $\alpha = 1/2$ and 
$\alpha = 0$ parameter values. Numerical data were collected from the same networks of size $10^4$, $10^5$, and 
$10^6$ as above. As the size of the network grows, a larger and larger range of 
the rescaled empirical data collapses to the same analytic curve. In the high degree region 
some discrepancy can be observed due to finite size effects. 

Finally, let us note that the precise unconditional distribution of edge betweenness 
$P_\tau(L) = \sum_{n=0}^{\tau-1} \delta_{L,(n+1)(\tau-n)}\, P_\tau(n)$ can be obtained from (11) as well. Furthermore, the CCDF of 
the unconditional betweenness $F^c_\tau(L) = \sum_{n=n_L}^{\tau-1-n_L} P_\tau(n)$ can be derived in closed form: 



For the sake of simplicity we have assumed during our calculations that the in-degrees of the 
"younger" nodes are provided. However, it is possible that even though both in-degrees 
of every link are known, we cannot distinguish them from each other, that is, we cannot tell 
which node is the "younger" one. How could we extend our results to this scenario? Let us 
consider a new edge when it is connected to the network. The in-degree of the new node is 
obviously 0. The in-degree of the other node, which the new node is connected to, is equal 
to or larger than one. Due to preferential attachment, the larger the in-degree is, the faster 
it grows. Even if preferential attachment is absent, the growth rate of every in-degree is 




(49) 
FIG. 8: (Color online) Average edge betweenness under the condition that the 
in-degree $q$ is known, as a function of $q$ on a log-log plot. Numerical data have been collected from 100 
realizations of $N = 10^4$ and $N = 10^5$, and 10 realizations of $N = 10^6$ size networks at the $\alpha = 1/2$ 
parameter value. The inset shows the same scenario at the $\alpha = 0$ parameter value on a semi-logarithmic 
plot. Continuous lines show the analytic results (47) and (48). 

the same. Therefore, it is expected that the initial deficit in the in-degree of the "younger" 
node grows or remains at the same level during the evolution of the network. It follows that 
it is a reasonable approximation to substitute the in-degree $q$ of the "younger" node with 
$q_{\min} = \min(q_1, q_2)$ in our formulae. 

IX. CONCLUSIONS 

A typical network construction problem is to design network infrastructure without 
wasting precious resources at places where they are not needed. An appropriate design strategy is 
to allocate network resources in proportion to the expected traffic. In a mean field 
approximation the expected traffic is proportional to the number of shortest paths going 
through a certain network element, that is, the betweenness. 

The precise calculation of all betweenness values requires complete information on the network 
structure. In real life, however, the number of shortest paths is often impossible to determine 
because the structure of the network is not fully known. One of the practical results of this 
paper is that the expectation of edge betweenness can be estimated precisely when only limited 
local information on the network structure (the in-degree of the "younger" node) is available. 
Another difficulty of network design is that the size of real networks is finite. Moreover, 
the size of real networks is often so small that asymptotic formulas can be applied only with 
unacceptable error. The other important novelty of our results is that the derived formulas 
are exact even for finite networks, which allows better design of realistic finite size networks. 

Various statistical properties of evolving random trees have been investigated in this 
paper. We have focused on the cluster size, the in-degree and the edge betweenness. We 
have considered the $m = 1$ case of the BA model extended with initial attractiveness for 
modeling random trees. Initial attractiveness allows fine tuning of the scaling parameter. 
Moreover, in the limit of the tuning parameter $\alpha \to 0$ the applied model tends to a 
non-scale-free structure, which is in many aspects similar to the classical ER model. Therefore, 
we were able to investigate both the scale-free and the non-scale-free scenario within the 
same framework. 

First, the evolution of cluster size and in-degree of a specific edge have been modeled as a 
bivariate Markov process. The master equation, associated with the Markov process, has led 
us to a linear partial difference equation. An exact analytic solution of the master equation, 
which satisfies the initial conditions as well, has been found. The solution provides the joint 
probability distribution of cluster size and in-degree for a specific edge. 

Using the above results we have derived the joint probability distribution of cluster size 
and in-degree for a randomly selected edge. It is of more practical importance than the joint 
distribution for a specific edge because, in contrast to the former distribution, it provides 
the statistical description of the whole network. We also derived the joint distribution in 
the ER limit. Note that the obtained formulae are exact even for finite size networks. In 
addition, the formulae for unbounded networks have been presented as well. 

We have continued our analysis with the one dimensional marginal distributions. We have 
shown some fundamental differences in the scaling properties of the marginal cluster size 
and in-degree distributions. The novelty of our results here, compared to previous results 
in the literature, is that we have found exact analytic formulae not only for the large, but 
also for the small cluster size and in-degree region. 

Although the marginal distributions have their own importance, we have derived them 
in order to obtain conditional probability distributions. From the combination of the joint 
and the marginal distributions we have given the conditional distributions of cluster size 
and in-degree. We have also presented conditional expectations of cluster size and in-degree 
for both finite and unbounded networks. We have found that asymptotically the conditional 
cluster size grows with in-degree to the power of $1/\alpha$ and the conditional in-degree grows 
with cluster size to the power of $\alpha$. The ER limit has been discussed as well. 
We have shown that the conditional cluster size grows exponentially and the conditional 
in-degree grows logarithmically when $\alpha \to 0$. 

Finally, by applying the transformation of random variables we have derived the distri- 
bution of edge betweenness under the condition that the corresponding in-degree is known. 
We have found that the conditional expectation of edge betweenness grows linearly with the 
size of the network. For the analysis of unbounded networks we have defined the rescaled 
edge betweenness $\Lambda$, and derived its distribution and expectation under the condition that 
in-degree q is provided. Our analytic results have been verified at different network sizes and 
parameter values by extensive numerical simulations. We have demonstrated that numerical 
simulations fully confirm our analytic results. 

We hope that the methods developed in this paper will allow us to 
describe cluster size and edge betweenness in more general scenarios, for example when 
not only the in-degree of the "younger" node but both in-degrees of a link are considered. 
APPENDIX A: EXPANSION OF THE KRONECKER-DELTA FUNCTION 

We have seen that the general solution of Eq. (7) is $P_\tau(n, q \mid \tau_e) = 
\sum_{\lambda_1,\lambda_2} C_{\lambda_1,\lambda_2}\, f(\tau)\, g(n)\, h(q)$, and the initial condition is $P_{\tau_e}(n, q \mid \tau_e) = \delta_{n,0}\,\delta_{q,0}$, where 

$$\delta_{n,m} = \begin{cases} 1, & \text{if } n = m, \\ 0, & \text{if } n \neq m \end{cases} \tag{A1}$$

is the Kronecker-delta function, and $n$ and $m$ are integers. Coefficients $C_{\lambda_1,\lambda_2}$ are calculated 
in this section. First we show that 

$$\delta_{n,0} = \sum_{k=0}^{\infty} \frac{(-1)^k}{k!\,\Gamma(n - k + 1)}. \tag{A2}$$

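Note that we can consider $m = 0$ without any loss of generality, since $\delta_{n,m} = \delta_{n-m,0}$. 
If $n < 0$, then every summand in (A2) is zero by definition, since $1/\Gamma(x)$ vanishes at the non-positive integers. For the same reason only the terms with $k \le n$ contribute if $n > 0$, and then 

$$\sum_{k=0}^{n} \frac{(-1)^k}{k!\,(n - k)!} = \frac{1}{n!} \sum_{k=0}^{n} \binom{n}{k} (-1)^k = 0 \tag{A3}$$

follows from the binomial theorem. Finally, for $n = 0$, 

$$\sum_{k=0}^{\infty} \frac{(-1)^k}{k!\,\Gamma(-k + 1)} = \frac{(-1)^0}{0!\,\Gamma(1)} = 1. \tag{A4}$$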
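The expansion (A2) can be checked with exact rational arithmetic. The following Python sketch is an illustration only; the function name and the truncation parameter `k_max` are assumptions, and the convention $1/\Gamma(x) = 0$ at the non-positive integers is implemented by skipping those terms.

```python
from fractions import Fraction
from math import factorial

def delta_expansion(n, k_max=50):
    """Truncated sum over k of (-1)^k / (k! * Gamma(n - k + 1)) for integer n.

    Terms with n - k < 0 are skipped because 1/Gamma vanishes at the
    poles of the Gamma function; k_max must be at least n.
    """
    total = Fraction(0)
    for k in range(k_max + 1):
        if n - k < 0:
            continue
        total += Fraction((-1) ** k, factorial(k) * factorial(n - k))
    return total

# Evaluate the expansion for a range of negative, zero and positive n.
values = {n: delta_expansion(n) for n in range(-3, 11)}
```

The sum equals 1 at $n = 0$ and vanishes for every other integer in the range, in agreement with (A3) and (A4).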

Coefficients $C_{\lambda_1,\lambda_2}$ can be obtained from the term by term comparison of $P_{\tau_e}(n, q \mid \tau_e) = 
\sum_{\lambda_1,\lambda_2} C_{\lambda_1,\lambda_2}\, f(\tau_e)\, g(n)\, h(q)$ with the expansion of the initial condition $\delta_{n,0}\,\delta_{q,0}$, shown above. 
One can easily confirm with the help of the identity $f(n)\,\delta_{n,0} = f(0)\,\delta_{n,0}$ that the same terms 
appear on both sides if $\lambda_1 = -k_1$ and $\lambda_2 = -\alpha k_2$, and the coefficients $C_{k_1,k_2}$ are the following: 

r (-i) fcl+fc2 r(r e + i-^) 1 1 

fcl ' fc2 fcjlifea! r(r e -ifei) r(-ajfe 2 )r(l/a-l)' { ' 



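Finally, to obtain (15) the summation over $k_1$ can be carried out explicitly: 

$$\sum_{k_1=0}^{\infty} \frac{(-1)^{k_1}}{k_1!\,\Gamma(n - k_1 + 1)}\, \frac{\Gamma(\tau - k_1)}{\Gamma(\tau_e - k_1)} = \frac{\Gamma(\tau - \tau_e + 1)}{\Gamma(n + 1)\,\Gamma(\tau_e)}\, \frac{\Gamma(\tau - n)}{\Gamma(\tau - \tau_e - n + 1)}.$$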
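The closed form of the $k_1$ summation can be verified for integer arguments. The Python sketch below assumes that the summand carries the alternating sign $(-1)^{k_1}$ inherited from the coefficients $C_{k_1,k_2}$, and that the identity reads $\sum_{k_1} (-1)^{k_1}\,\Gamma(\tau - k_1) / [k_1!\,\Gamma(n - k_1 + 1)\,\Gamma(\tau_e - k_1)] = \Gamma(\tau - \tau_e + 1)\,\Gamma(\tau - n) / [\Gamma(n + 1)\,\Gamma(\tau_e)\,\Gamma(\tau - \tau_e - n + 1)]$. The ratio $\Gamma(\tau - k_1)/\Gamma(\tau_e - k_1)$ is written as the rising factorial $(\tau_e - k_1)_{\tau - \tau_e}$, which stays finite for all $k_1$. All function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def rising(x, d):
    """Pochhammer symbol (x)_d = x (x + 1) ... (x + d - 1) for integer d >= 0."""
    out = 1
    for i in range(d):
        out *= x + i
    return out

def summed_side(n, tau, tau_e):
    """Left-hand side; terms with k > n vanish because 1/Gamma(n - k + 1) = 0."""
    d = tau - tau_e
    return sum(
        Fraction((-1) ** k * rising(tau_e - k, d), factorial(k) * factorial(n - k))
        for k in range(n + 1)
    )

def closed_side(n, tau, tau_e):
    """Right-hand side; it vanishes when n > tau - tau_e."""
    d = tau - tau_e
    if n > d:
        return Fraction(0)
    return Fraction(
        factorial(d) * factorial(tau - n - 1),
        factorial(n) * factorial(tau_e - 1) * factorial(d - n),
    )

# Compare both sides on a small grid of integer arguments.
mismatches = [
    (n, tau, tau_e)
    for tau_e in range(1, 6)
    for tau in range(tau_e, tau_e + 6)
    for n in range(tau)
    if summed_side(n, tau, tau_e) != closed_side(n, tau, tau_e)
]
```

Exact fractions are used so that agreement on the grid is not an artifact of floating point rounding.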

APPENDIX B: THE $\alpha \to 0$ LIMIT OF THE JOINT DISTRIBUTION $P_\tau(n, q)$ 

In this section we prove that the ER limit of the joint probability $P_\tau(n, q)$ is (16). 
Theorem. Let us consider $P_\tau(n, q)$ as defined in (13), where $0 < q \le n < \tau$ are integers. 
Then the following limit holds: 

$$\lim_{\alpha \to 0} P_\tau(n, q) = \frac{\tau + 1}{\tau\,\Gamma(n + 3)} \sum_{k=q-1}^{n-1} (-1)^{n-1-k}\, S_{n-1}^{(k)} \binom{k}{q-1}. \tag{B1}$$
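The limit can be evaluated exactly for small arguments. The Python sketch below assumes that (B1) reads $\lim_{\alpha \to 0} P_\tau(n, q) = \frac{\tau + 1}{\tau\,\Gamma(n + 3)} \sum_{k=q-1}^{n-1} (-1)^{n-1-k} S_{n-1}^{(k)} \binom{k}{q-1}$, where $(-1)^{m-k} S_m^{(k)}$ is the coefficient of $x^k$ in the rising factorial $(x)_m$. The function names are hypothetical. As a consistency check, summing over $q$ should give $(\tau + 1) / [\tau\,(n + 1)(n + 2)]$, because $\sum_k (-1)^{n-1-k} S_{n-1}^{(k)}\, 2^k = (2)_{n-1} = n!$.

```python
from fractions import Fraction
from math import comb, factorial

def rising_factorial_coeffs(m):
    """Coefficients c[k] of x^k in (x)_m = x (x + 1) ... (x + m - 1).

    c[k] equals (-1)^(m - k) * S_m^(k), with S the signed Stirling
    numbers of the first kind.
    """
    coeffs = [1]
    for i in range(m):
        new = [0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k] += c * i        # contribution of the constant term of (x + i)
            new[k + 1] += c        # contribution of the x term of (x + i)
        coeffs = new
    return coeffs

def er_limit_joint(n, q, tau):
    """Assumed right-hand side of (B1) for integers 0 < q <= n < tau."""
    c = rising_factorial_coeffs(n - 1)
    s = sum(c[k] * comb(k, q - 1) for k in range(q - 1, n))
    return Fraction(tau + 1, tau * factorial(n + 2)) * s

def er_limit_cluster_marginal(n, tau):
    """Sum of the assumed joint limit over all admissible in-degrees q = 1..n."""
    return sum(er_limit_joint(n, q, tau) for q in range(1, n + 1))
```

Using exact fractions keeps the alternating-sign bookkeeping transparent, at the cost of speed for large $n$.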
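Proof. First, let us note that $\Phi_\alpha(n, q)$ in (15) can be rewritten in the following equivalent 
form: 

$$\Phi_\alpha(n, q) = \alpha \sum_{k=0}^{q-1} \frac{(-1)^k\,(1 - \alpha - \alpha k)_{n-1}}{k!\,(q - 1 - k)!}.$$

Next, Pochhammer's symbol $(1/\alpha - 1)_q$ is replaced with its asymptotic form: 
$(1/\alpha - 1)_q = \alpha^{-q}\,(1 + O(\alpha))$. After the obvious limits 
have been evaluated the following equation is obtained: 

$$\lim_{\alpha \to 0} P_\tau(n, q) = \frac{\tau + 1}{\tau\,\Gamma(n + 3)} \lim_{\alpha \to 0} \frac{1}{\alpha^{q-1}} \sum_{k=0}^{q-1} \frac{(-1)^k\,(1 - \alpha - \alpha k)_{n-1}}{k!\,(q - 1 - k)!}. \tag{B2}$$

The above limit, by definition, can be substituted with the derivative of order $q - 1$ at $\alpha = 0$, 
if all the lower order derivatives of the sum are zero at $\alpha = 0$. Indeed, 

$$\frac{1}{m!} \frac{d^m}{d\alpha^m} \sum_{k=0}^{q-1} \frac{(-1)^k\,(1 - \alpha - \alpha k)_{n-1}}{k!\,(q - 1 - k)!} \Bigg|_{\alpha=0} = \frac{1}{m!} \frac{d^m (1 + \alpha)_{n-1}}{d\alpha^m} \Bigg|_{\alpha=0} \sum_{k=0}^{q-1} \frac{(-1)^k\,(-k - 1)^m}{k!\,(q - 1 - k)!},$$

where the sum is 0 if $m < q - 1$ and 1 if $m = q - 1$. Therefore, the limit can be transformed 
to 

$$\lim_{\alpha \to 0} \frac{1}{\alpha^{q-1}} \sum_{k=0}^{q-1} \frac{(-1)^k\,(1 - \alpha - \alpha k)_{n-1}}{k!\,(q - 1 - k)!} = \frac{1}{(q - 1)!} \frac{d^{q-1} (1 + \alpha)_{n-1}}{d\alpha^{q-1}} \Bigg|_{\alpha=0}. \tag{B3}$$

Finally, let us consider the power expansion of Pochhammer's symbol: 
$(x)_m = \sum_{k=0}^{m} (-1)^{m-k}\, S_m^{(k)}\, x^k$, where $S_m^{(k)}$ are the Stirling numbers of the first kind. The expansion 
formula has been applied at $x = 1 + \alpha$ and $m = n - 1$, which implies 

$$\lim_{\alpha \to 0} P_\tau(n, q) = \frac{\tau + 1}{\tau\,\Gamma(n + 3)}\, \frac{1}{(q - 1)!} \frac{d^{q-1}}{d\alpha^{q-1}} \sum_{k=0}^{n-1} (-1)^{n-1-k}\, S_{n-1}^{(k)}\, (1 + \alpha)^k \Bigg|_{\alpha=0} = \frac{\tau + 1}{\tau\,\Gamma(n + 3)} \sum_{k=q-1}^{n-1} (-1)^{n-1-k}\, S_{n-1}^{(k)} \binom{k}{q-1}.$$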